
Putting An End to End-to-End: Gradient-Isolated Learning of Representations

Neural Information Processing Systems

We propose a novel deep learning method for local self-supervised representation learning that requires neither labels nor end-to-end backpropagation, but instead exploits the natural order in data. Inspired by the observation that biological neural networks appear to learn without backpropagating a global error signal, we split a deep neural network into a stack of gradient-isolated modules. Each module is trained to maximally preserve the information of its inputs using the InfoNCE bound from Oord et al. [2018]. Despite this greedy training, we demonstrate that each module improves upon the output of its predecessor, and that the representations created by the top module yield highly competitive results on downstream classification tasks in the audio and visual domains. The proposal enables optimizing modules asynchronously, allowing large-scale distributed training of very deep neural networks on unlabelled datasets.
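Each module's local objective is the InfoNCE bound mentioned in the abstract. A rough NumPy sketch of that loss is below; the function name, the log-bilinear score, and all shapes are illustrative assumptions rather than the paper's exact implementation:

```python
import numpy as np

def info_nce_loss(c_t, z_pos, z_neg, W):
    """InfoNCE bound (Oord et al., 2018) with a log-bilinear score
    f(z, c) = z^T W c. The positive future encoding z_pos must be
    classified against negatives drawn from other sequences.
    Hypothetical shapes: c_t (d_c,), z_pos (d_z,), z_neg (n, d_z),
    W (d_z, d_c)."""
    pos_score = z_pos @ W @ c_t            # scalar score for the positive
    neg_scores = z_neg @ W @ c_t           # (n,) scores for the negatives
    all_scores = np.concatenate([[pos_score], neg_scores])
    # -log softmax of the positive, computed in a numerically stable way
    m = all_scores.max()
    log_denom = m + np.log(np.exp(all_scores - m).sum())
    return -(pos_score - log_denom)

# Toy usage with random encodings
rng = np.random.default_rng(0)
d_z, d_c, n_neg = 8, 8, 10
W = rng.normal(size=(d_z, d_c))
c_t = rng.normal(size=d_c)
z_pos = rng.normal(size=d_z)
z_neg = rng.normal(size=(n_neg, d_z))
loss = info_nce_loss(c_t, z_pos, z_neg, W)
```

In Greedy InfoMax each module minimizes such a loss on its own encodings, and only a gradient-blocked copy of its output is fed to the next module, so no error signal crosses module boundaries.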


Reviews: Putting An End to End-to-End: Gradient-Isolated Learning of Representations

Neural Information Processing Systems

In the present manuscript the authors propose Greedy InfoMax, a greedy algorithm that allows unsupervised learning in deep neural networks with state-of-the-art performance. Specifically, the algorithm leverages implicit label information which is encoded temporally in the streaming data. Importantly, the present work rests on the shoulders and success of Contrastive Predictive Coding, but dispenses with end-to-end training entirely. Getting greedy layer-wise unsupervised learning to perform at such levels is quite impressive and will without doubt have an important impact on the community. The work is original, and the quality of the writing and figures seems quite high. What I would have liked to see is a more in-depth review of the precise data generation process.



Putting An End to End-to-End: Gradient-Isolated Learning of Representations

Löwe, Sindy; O'Connor, Peter; Veeling, Bastiaan

Neural Information Processing Systems



r/MachineLearning - [1905.11786] Putting An End to End-to-End: Gradient-Isolated Learning of Representations


In general I agree, but in machine learning mutual information seems to be a case where approximation can sometimes help rather than hurt. In another discussion this week about the Tishby information bottleneck, cameldrv correctly said that the mutual information between a signal and its encrypted version should be high, but that in practice no algorithm will discover this. But turn that around: when used in a complex DNN, a learning algorithm that seeks to maximize mutual information (such as today's Putting-An-End-to-End-to-End paper) could in theory produce something like weak encryption: the desired information is extracted, but in such a complex form that _another_ DNN classifier would be needed to extract it! So the fact that mutual information can only be approximated can be a good thing, because this failure mode is prevented when optimizing objectives that cannot "see" complex relationships. A radical example is in the HSIC bottleneck paper, where an approximation that is only monotonically related to mutual information spontaneously produced one-hot classifications without any guidance.
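The encryption point can be made concrete with a toy numerical check (my own illustration, not from the thread): a fixed byte-substitution cipher preserves all of the signal's information, so the exact mutual information between signal and ciphertext is maximal, yet a simple statistic like linear correlation sees essentially nothing.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(0, 256, size=20000)   # byte-valued "signal"
perm = rng.permutation(256)            # fixed substitution cipher (a bijection)
y = perm[x]                            # "encrypted" version of x

def entropy(counts):
    """Empirical Shannon entropy in bits from a histogram of counts."""
    p = counts[counts > 0] / counts.sum()
    return -(p * np.log2(p)).sum()

# Exact mutual information from the joint histogram. Since y is a
# bijection of x, I(X; Y) = H(X) -- close to the maximum of 8 bits.
joint, _, _ = np.histogram2d(x, y, bins=256)
mi = (entropy(np.histogram(x, bins=256)[0])
      + entropy(np.histogram(y, bins=256)[0])
      - entropy(joint.ravel()))

# A "simple" dependence measure sees almost nothing: the substitution
# destroys any linear relationship, so the correlation is near zero.
corr = np.corrcoef(x, y)[0, 1]
```

The dependence is total (mi is about 8 bits), but an objective built on linear correlation could never recover it, which is exactly the "weak encryption" failure mode described above.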